Enhanced Feature Extraction Network Based on Acoustic Signal Feature Learning for Bearing Fault Diagnosis

The method of acoustic radiation signal detection not only enables contactless measurement but also provides comprehensive state information during equipment operation. This paper proposes an enhanced feature extraction network (EFEN) for fault diagnosis of rolling bearings based on acoustic signal feature learning. The EFEN network comprises four main components: the data preprocessing module, the information feature selection module (IFSM), the channel attention mechanism module (CAMM), and the convolutional neural network module (CNNM). Firstly, the one-dimensional acoustic signal is transformed into a two-dimensional grayscale image. Then, IFSM utilizes three different-sized convolution filters to process input image data and fuse and assign weights to feature information that can attenuate noise while highlighting effective fault information. Next, a channel attention mechanism module is introduced to assign weights to each channel. Finally, the convolutional neural network (CNN) fault diagnosis module is employed for accurate classification of rolling bearing faults. Experimental results demonstrate that the EFEN network achieves high accuracy in fault diagnosis and effectively detects rolling bearing faults based on acoustic signals. The proposed method achieves an accuracy of 98.52%, surpassing other methods in terms of performance. In comparative analysis of antinoise experiments, the average accuracy remains remarkably high at 96.62%, accompanied by a significantly reduced average iteration time of only 0.25 s. Furthermore, comparative analysis confirms that the proposed algorithm exhibits excellent accuracy and resistance against noise.


Introduction
The operation of mechanical equipment heavily relies on the performance of rolling bearings, which highlights their crucial role.Ensuring the safe functioning of bearings not only minimizes economic losses for enterprises but also prevents potential casualties [1,2].The vibration signal emitted by rolling bearings typically carries valuable information regarding their operational faults, thereby offering a promising avenue for accurate fault diagnosis in mechanical research on fault diagnosis, which has extensively explored this area based on vibration signals.
In fault diagnosis based on vibration signals, He et al. [3] installed vibration sensors on mechanical equipment for detection and proposed a research method for fault diagnosis of rotating machinery.Chen et al. [4] used vibration sensors to detect the long-term health status of wind turbines and established a quantitative index of the damage degree of wind turbine bearing failure.Teng et al. [5] tested the bearing fault of a 2 MW wind turbine and demodulated the collected vibration signals, successfully proving that the cyclic spectrum correlation method can effectively identify the fault characteristics of rolling bearings.Qiao et al. [6] made a fault diagnosis of rotating machinery based on vibration signals and successfully extracted early weak fault characteristics of rolling bearings by using a stochastic resonance method.Miao et al. [7] proposed an improved adaptive variational mode decomposition method and used vibration signals to realize composite fault diagnosis of rotating machinery bearings.
Although the above methods can successfully identify the fault characteristics of bearings, they are all based on vibration signals.The fault diagnosis of vibration signals belongs to contact measurement, and the reasonable installation of a vibration sensor has certain requirements regarding its location.For some complex and precise equipment or for instances when the field sensor installation is limited, higher requirements for the sensor layout are put forward; in these cases, acoustic sensors can be used in a noncontact way to collect the acoustic radiation signal of the equipment [8,9].Additionally, the irregular geometry of certain equipment often hinders direct sensor installation.
Acoustic radiation signals contain rich operating-status information of rolling bearings.In fault diagnosis research based on acoustic signals, some scholars have undertaken the following work: fault classification [10], fault prediction [11], and condition monitoring [12] of rotating machinery based on sound signals.And the integration of deep learning techniques [13][14][15][16] with acoustic radiation signals is specifically emphasized.
For example, Josu'e Pacheco-Ch'errez [17] used three different supervised machine learning methods for comparative analysis to improve the fault prediction accuracy of rotating machinery.Wang et al. [18] proposed a multimodal sensor signal fusion method for acceleration signal and sound signal acquisition, which extracts features from the original vibration signal and acoustic signal, uses a CNN network to fuse them, and finally realizes the fault classification of rolling bearings.In order to solve the bearing fault problem of CNC machine tools, Mohmad lqbal [19] proposed a fault diagnosis method based on the acoustic signal of a convolutional neural network, and the research shows that this method can realize the classification of bearing faults.Eugenio Brusa [20] applied the transfer learning strategy to the fault diagnosis of rolling bearings and verified that the deep learning architecture based on sound signals can realize the fault diagnosis of machines.Bai et al. [21] fused sound and vibration signals to improve the detection accuracy of rolling bearing fault characteristics, which is conducive to the condition monitoring of bearing systems.Zhang et al. [22] focused on the fault diagnosis of offshore wind power equipment using acoustic emission signals and vibration signals.The advantages of acoustic emission sensors include their high sensitivity, precision, and ability to acquire large amounts of data.However, they are limited in their application to specific fields for nondestructive testing and lack wide-ranging applicability.
Although the aforementioned studies made certain advancements in deep learning, their utilization of convolution kernel size is overly simplistic and predominantly relies on single-channel diagnosis, thereby limiting the effective extraction of more comprehensive fault feature information.Moreover, the sound signal is seriously disturbed by background noise, so the question of how to filter the collected sound signal is very important.For complex equipment in the process of operation, the collected signal contains a lot of interference information and shows strong nonstationarity.If the collected original signal is directly input into the neural network, the network will learn many invalid features, resulting in a reduction in classification accuracy.Therefore, the question of how to effectively use sound signal and a deep learning algorithm to achieve fault diagnosis of rolling bearings is very important.
Based on the above research, this paper presents a new method of rolling bearing fault diagnosis based on a multichannel acoustic array signal and multinetwork module combination.The main work of this paper is as follows: (1) An IFSM is developed, which utilizes three convolutional filters of varying sizes to process input image data.
(2) The CAMM is constructed, and it is utilized to assign weights to all branch channels, thereby achieving the refinement of fault information.
(3) The research on fault diagnosis method for rolling bearings based on sound signals is accomplished by constructing a deep learning network framework that integrates the IFSM, CAMM, and CNNM.
The rest of this article is described below.In Section 2, the proposed theoretical method is introduced in detail.In Section 3, experiments are used to verify the effectiveness of the proposed method.Finally, the conclusion of this paper is introduced.

Multichannel Acoustic Array Data
Compared with the single microphone sensor, the microphone array adopts multiple acoustic sensors to collect data on the running state of the rolling bearing.The ring array can collect the characteristics of the bearing's circumferential and radial running state.The schematic diagram of the acoustic array measurement points is shown in Figure 1.The fault characteristic information of different bearing types can be collected from multiple angles by installing multiple acoustic sensors on the ring disc to improve the fault diagnosis and identification accuracy of rolling bearings.(2) The CAMM is constructed, and it is utilized to assign weights to all branch channels, thereby achieving the refinement of fault information.
(3) The research on fault diagnosis method for rolling bearings based on sound signals is accomplished by constructing a deep learning network framework that integrates the IFSM, CAMM, and CNNM.
The rest of this article is described below.In Section 2, the proposed theoretical method is introduced in detail.In Section 3, experiments are used to verify the effectiveness of the proposed method.Finally, the conclusion of this paper is introduced.

Multichannel Acoustic Array Data
Compared with the single microphone sensor, the microphone array adopts multiple acoustic sensors to collect data on the running state of the rolling bearing.The ring array can collect the characteristics of the bearing's circumferential and radial running state.The schematic diagram of the acoustic array measurement points is shown in Figure 1.The fault characteristic information of different bearing types can be collected from multiple angles by installing multiple acoustic sensors on the ring disc to improve the fault diagnosis and identification accuracy of rolling bearings.Multiple ring array sensors were used for data acquisition, and each data acquisition channel was arranged in parallel to construct a two-dimensional spatial data matrix, and each independent channel also contained rich fault feature information of rolling bearings.In order to extract effective fault feature information at a deeper level, this paper proposed an information feature selection module.And because each channel is both independent and interrelated, it is therefore very important to assign weight to the fault information contained in each channel.This paper uses the channel a ention mechanism module to achieve this function.

Data Preprocessing
Since the two-dimensional image signal contains higher-dimensional fault feature information, this paper converts the collected one-dimensional sound signal into a two-dimensional gray image for feature input.Two-dimensional images provide more abundant fault feature information, which can improve the sample quality of model training.Multiple ring array sensors were used for data acquisition, and each data acquisition channel was arranged in parallel to construct a two-dimensional spatial data matrix, and each independent channel also contained rich fault feature information of rolling bearings.In order to extract effective fault feature information at a deeper level, this paper proposed an information feature selection module.And because each channel is both independent and interrelated, it is therefore very important to assign weight to the fault information contained in each channel.This paper uses the channel attention mechanism module to achieve this function.

Data Preprocessing
Since the two-dimensional image signal contains higher-dimensional fault feature information, this paper converts the collected one-dimensional sound signal into a two-dimensional gray image for feature input.Two-dimensional images provide more abundant fault feature information, which can improve the sample quality of model training.
Firstly, the collected sound signals are normalized.The value range of the normalized data is 1 to 0, corresponding to the change in brightness and darkness of the gray value in the grayscale image, and its mathematical expression is as follows: where x represents the input signal, and x min and x max represent the minimum and maximum values of signal x, respectively.The output of an RGB image does not generate redundant features; thus, gray images convert the three channels into one channel without any additional operations, resulting in a significant reduction in computational requirements.Then, the collected one-dimensional data signal is reconstructed to form a gray image.In order to optimize the efficiency of network training, it is advisable to limit the size of the fully connected layer structure.This study employs a sampling point length of 1024 for the intercepted onedimensional signal.Additionally, for computational convenience, the input gray image is designed with equal length and height.Consequently, the one-dimensional signal is evenly divided into 32 segments on average, each segment having a length of 32.Every 32 data points form a column of the grayscale map, and 32 segments of data are stacked with 32 columns and finally reconstructed into 32 × 32 two-dimensional grayscale images.The detailed operation is shown in Figure 2.
Sensors 2023, 23, x FOR PEER REVIEW 4 of 18 Firstly, the collected sound signals are normalized.The value range of the normalized data is 1 to 0, corresponding to the change in brightness and darkness of the gray value in the grayscale image, and its mathematical expression is as follows: where x represents the input signal, and xmin and xmax represent the minimum and maximum values of signal x, respectively.The output of an RGB image does not generate redundant features; thus, gray images convert the three channels into one channel without any additional operations, resulting in a significant reduction in computational requirements.Then, the collected one-dimensional data signal is reconstructed to form a gray image.In order to optimize the efficiency of network training, it is advisable to limit the size of the fully connected layer structure.This study employs a sampling point length of 1024 for the intercepted one-dimensional signal.Additionally, for computational convenience, the input gray image is designed with equal length and height.Consequently, the one-dimensional signal is evenly divided into 32 segments on average, each segment having a length of 32.Every 32 data points form a column of the grayscale map, and 32 segments of data are stacked with 32 columns and finally reconstructed into 32 × 32 twodimensional grayscale images.The detailed operation is shown in Figure 2.

IFSM
The utilization of smaller convolution kernels enables the extraction of more localized features, whereas larger convolution kernels facilitate the extraction of more global features.Generally, incorporating multiple convolution cores of varying sizes can enhance network performance [23,24].To enhance fault information feature extraction capability, three distinct convolution kernels are selected.The IFSM is shown in Figure 3. Three convolution kernels of different scales are used to carry out convolution operations, and the results of the operations are spliced to obtain signals:

IFSM
The utilization of smaller convolution kernels enables the extraction of more localized features, whereas larger convolution kernels facilitate the extraction of more global features.Generally, incorporating multiple convolution cores of varying sizes can enhance network performance [23,24].To enhance fault information feature extraction capability, three distinct convolution kernels are selected.The IFSM is shown in Figure 3. Three convolution kernels of different scales are used to carry out convolution operations, and the results of the operations are spliced to obtain signals: where x c indicates the L × L dimension number x of c channels.conv 1 , conv 2 , and conv 3 represent convolution operations with convolution kernel sizes of 3 × 3, 5 × 5, and 7 × 7, respectively.And concat represents the feature concatenation operation.Then, x conv c is then fused by a convolution kernel of size 1 × 1 to obtain x conv .Finally, the feature weight of input signal x is obtained by sigmoid activation function: where σ(x) = 1/(1 + e −x ) represents sigmoid activation function, and conv 4 stands for convolution.Finally, after processing by IFSM module, output signal x IFSM c is obtained: Sensors 2023, 23, x FOR PEER REVIEW 5 of 18 where xc indicates the L × L dimension number x of c channels.conv1, conv2, and conv3 represent convolution operations with convolution kernel sizes of 3 × 3, 5 × 5, and 7 × 7, respectively.And concat represents the feature concatenation operation.Then, conv c x is then fused by a convolution kernel of size 1 × 1 to obtain conv x .Finally, the feature weight of input signal x is obtained by sigmoid activation function:

CAMM
Considering that different channels contain different contribution degrees of faultcharacteristic information, the channel a ention mechanism is used to assign different weight values to each channel [25,26].The specific operation process is shown in Figure 4. Firstly, the input signals IF SM   c   x are processed by global average pooling and global maximum pooling, respectively.The acoustic signals of each channel are compressed and the fault features of each channel are compressed into a global feature.Then, the generated two feature maps are fed into the multilayer perceptron (MLP) with shared weights for interchannel learning, and the dimensionality between the two neural layers is reduced by compression ratio.Finally, the MLP output features are added and activated by sigmoid function to generate channel weight ωc and calculate the output features according to element multiplication.The expression is defined as follows:

CAMM
Considering that different channels contain different contribution degrees of faultcharacteristic information, the channel attention mechanism is used to assign different weight values to each channel [25,26].The specific operation process is shown in Figure 4. Firstly, the input signals x IFSM c are processed by global average pooling and global maximum pooling, respectively.The acoustic signals of each channel are compressed and the fault features of each channel are compressed into a global feature.Then, the generated two feature maps are fed into the multilayer perceptron (MLP) with shared weights for interchannel learning, and the dimensionality between the two neural layers is reduced by compression ratio.Finally, the MLP output features are added and activated by sigmoid function to generate channel weight ω c and calculate the output features according to element multiplication.The expression is defined as follows: )) (5)

CNNM
The fault diagnosis of rolling bearings is achieved through the CNNM network af successfully completing IFSM and CAMM.The classical structure of CNN is shown Figure 5, which is mainly composed of input layer, convolution layer, pooling layer, a vation function layer, and full connection layer.(1) Convolutional layer.
The 2D convolution operation is defined as follows: where 2 ( ) g l represents the features extracted from the lth convolution kernel; , (2) Pooling layer.

CNNM
The fault diagnosis of rolling bearings is achieved through the CNNM network after successfully completing IFSM and CAMM.The classical structure of CNN is shown in Figure 5, which is mainly composed of input layer, convolution layer, pooling layer, activation function layer, and full connection layer.

CNNM
The fault diagnosis of rolling bearings is achieved through the CNNM network aft successfully completing IFSM and CAMM.The classical structure of CNN is shown Figure 5, which is mainly composed of input layer, convolution layer, pooling layer, ac vation function layer, and full connection layer.(1) Convolutional layer.
The 2D convolution operation is defined as follows: where g 2 (l) represents the features extracted from the lth convolution kernel; ω l i,j represents the weight coefficient; b l stands for the deviation coefficient; and m and n indicate the size of the input information.
After the convolution operation, the linear rectification function (ReLU) is used to carry out nonlinear transformation of the obtained data results.The formula is as follows: Sensors 2023, 23, 8703 The pooling layer is equivalent to downsampling, which compresses the input information features, thus speeding up the operation speed of the neural network.This paper adopts the maximum pooling algorithm, which is defined as follows: where N represents the size of the convolution kernel and l represents the lth pooling kernel.
The fully connected layer integrates the features extracted by the previous layer network and maps these features to the sample label space.The fully connected layer weights and sums the output features of the previous layer and inputs the results into the activation function to complete the classification of the target.
where w f c is the weight of the fully connected layer and f (•) is the softmax activation function.
In this study, EFENet is an end-to-end model that relies on a backpropagation algorithm to update parameters during training.The loss function of EFENet is the crossentropy loss function: where y i is the actual label and y pre i is the predicted label.The present study proposes a feature-enhanced deep learning network and utilizes acoustic array signals for the purpose of fault diagnosis in rolling bearings, based on the aforementioned research foundation.The overall logic block diagram is illustrated in Figure 6, with the specific steps outlined as follows: Step 1: acquire multichannel data of rolling bearings using an acoustic array sensor; Step 2: convert one-dimensional data into a two-dimensional grayscale image; Step 3: feed the 2D grayscale images generated by each channel into the IFSM module for deep learning; Step 4: apply the channel attention mechanism module to weight all channels; Step 5: input the fused data into the CNNM module for fault feature extraction; Step 6: obtain classification results as output.

Experiment Introduction
In this study, in order to verify the feasibility of the proposed method, experimental data were obtained from the rolling bearing test bench.The general technical route is

Experiment Introduction
In this study, in order to verify the feasibility of the proposed method, experimental data were obtained from the rolling bearing test bench.The general technical route is shown in Figure 6.The experimental test system included the motor, controller, rotating shaft, experimental bearing, acoustic array sensor, and data collector, as shown in Figure 7. Sixteen acoustic sensors are mounted on a circular disk in the form of a ring array.The sensor model number is BSWA MPA416, and the sensor sensitivity is 50 mV/Pa; according to the sampling theorem, the sampling frequency of each channel is set to 16,384 Hz. the attenuation of the signal becomes more pronounced when it is positioned 500 mm away from the transmitting source.To facilitate the acquisition of the acoustic array signal, the acoustic array sensor is installed on the plane 200 mm away from the bearing under test; the center point of the sensor is in a straight line with the center of the bearing under test.The status information of the bearing during operation is collected by the acoustic array sensor, and the collected acoustic signals are transmitted to the data acquisition system.The acoustic array sensor is shown in Figure 8a.The model of the data collector is PAK MKII-SC42, as shown in Figure 8b.

Experiment Introduction
In this study, in order to verify the feasibility of the proposed method, experimental data were obtained from the rolling bearing test bench.The general technical route is shown in Figure 6.The experimental test system included the motor, controller, rotating shaft, experimental bearing, acoustic array sensor, and data collector, as shown in Figure 7. Sixteen acoustic sensors are mounted on a circular disk in the form of a ring array.The sensor model number is BSWA MPA416, and the sensor sensitivity is 50 mV/Pa; according to the sampling theorem, the sampling frequency of each channel is set to 16,384 Hz. the a enuation of the signal becomes more pronounced when it is positioned 500 mm away from the transmi ing source.To facilitate the acquisition of the acoustic array signal, the acoustic array sensor is installed on the plane 200 mm away from the bearing under test; the center point of the sensor is in a straight line with the center of the bearing under test.The status information of the bearing during operation is collected by the acoustic array sensor, and the collected acoustic signals are transmi ed to the data acquisition system.The acoustic array sensor is shown in Figure 8a.The model of the data collector is PAK MKII-SC42, as shown in Figure 8b.The running state of rolling bearings mainly includes the following: normal, inner ring failure, outer ring failure, rolling element failure, and cage failure.Bearing assemblies with different faults were tested on the experimental bench.The geometric parameters of the experimental bearings are shown in Table 1.The running state of rolling bearings mainly includes the following: normal, inner ring failure, outer ring failure, rolling element failure, and cage failure.Bearing assemblies with different faults were tested on the experimental bench.The geometric parameters of the experimental bearings are shown in Table 1.The different failure components are shown in Figure 9.
The running state of rolling bearings mainly includes the following: norm ring failure, outer ring failure, rolling element failure, and cage failure.Bearing as with different faults were tested on the experimental bench.The geometric param the experimental bearings are shown in Table 1.The different failure components are shown in Figure 9.In the process of dataset processing, the failures can be categorized into seve which include inner failure, outer failure, rolling element failure, cage failure, a pling failure.Each fault type contains 500 samples, of which 80% of the dataset is network training, while the remaining 20% of the dataset is used for testing and tion.The details of the dataset are shown in Table 2.The fault types are categoriz numerical labels ranging from C0 to C6.In addition, the tenfold crossover met used for comparative analysis in the experiments.The length of the data sample The motor speed is 1200 rpm.In the process of dataset processing, the failures can be categorized into seven types, which include inner failure, outer failure, rolling element failure, cage failure, and coupling failure.Each fault type contains 500 samples, of which 80% of the dataset is used for network training, while the remaining 20% of the dataset is used for testing and verification.The details of the dataset are shown in Table 2.The fault types are categorized with numerical labels ranging from C0 to C6.In addition, the tenfold crossover method was used for comparative analysis in the experiments.The length of the data sample is 1024.The motor speed is 1200 rpm.The judicious selection of batch size and learning rate not only yields a relatively robust model but also significantly reduces computational overhead.The model optimizer utilizes the Adam algorithm with a learning rate of 0.    In the process of neural network training, in order to improve the generalization ability of the network, we use batch normalization (BN) technology.BN is usually placed between the convolutional layer and the pooling layer.In order to prevent overfi ing during training, the dropout layer is introduced, and the value of dropout in this paper is set to 0.5.The computer is configured as follows: 11th Gen Intel(R) Core(TM) i7-11700K @ 3.60 GHz, RAM 48 G, NVIDIA GeForce GTX 3070 GPU and made in Texas, USA.EFENet is built under Torch-gpu 1.11.0 based on Python 3.10 [27].Table 3 shows the detailed information of the proposed algorithm framework.In the process of neural network training, in order to improve the generalization ability of the network, we use batch normalization (BN) technology.BN is usually placed between the convolutional layer and the pooling layer.In order to prevent overfitting during training, the dropout layer is introduced, and the value of dropout in this paper is set to 0.5.The computer is configured as follows: 11th Gen Intel(R) Core(TM) i7-11700K @ 3.60 GHz, RAM 48 G, NVIDIA GeForce GTX 3070 GPU and made in Texas, USA.EFENet is built under Torch-gpu 1.11.0 based on Python 3.10 [27].Table 3 shows the detailed information of the proposed algorithm framework.

Analysis Results
The training and testing process and loss value of the proposed algorithm are shown in Figure 11.It can be seen from Figure 11 that the training accuracy of the proposed algorithm reaches convergence when it iterates about 18 times.The final accuracy of the test set is maintained at about 98%.The loss value of the model decreases rapidly with the increase in the number of iterations and finally remains stable.From the analysis results, it can be concluded that the method proposed in this paper can effectively propose the fault characteristics of rolling bearings from the acoustic array signal.In order to further analyze the difficulty of feature extraction of each type of sound fault signal, the confusion matrix of the test set is shown in Figure 12.The classification accuracy of C0 and C6 fault data is poor, but it is also maintained at 95%.The diagnostic accuracy of other categories is above 98%, which indicates that the proposed method can effectively diagnose the running state of rolling bearings under different fault

Analysis Results
The training and testing process and loss value of the proposed algorithm are shown in Figure 11.It can be seen from Figure 11 that the training accuracy of the proposed algorithm reaches convergence when it iterates about 18 times.The final accuracy of the test set is maintained at about 98%.The loss value of the model decreases rapidly with the increase in the number of iterations and finally remains stable.From the analysis results, it can be concluded that the method proposed in this paper can effectively propose the fault characteristics of rolling bearings from the acoustic array signal.In order to further analyze the difficulty of feature extraction of each type of sound fault signal, the confusion matrix of the test set is shown in Figure 12.The classification accuracy of C0 and C6 fault data is poor, but it is also maintained at 95%.The diagnostic accuracy of other categories is above 98%, which indicates that the proposed method can effectively diagnose the running state of rolling bearings under different fault conditions.In order to understand the details of data processing in each process of the model, a t-distributed stochastic neighbor embedding algorithm (t-SNE) is used to process the initial input stage, information feature selection, channel a ention mechanism, and convolutional neural network of various fault data.The results are shown in Figure 14.It can be seen from Figure 14a that when the data first entered the model, the distribution was irregular and random.In Figure 14b, the feature distribution gradually became regular, and data of different categories began to gather within the class.In Figure 14c, the spacing In order to understand the details of data processing in each process of the model, a t-distributed stochastic neighbor embedding algorithm (t-SNE) is used to process the initial input stage, information feature selection, channel attention mechanism, and convolutional neural network of various fault data.The results are shown in Figure 14.It can be seen from Figure 14a that when the data first entered the model, the distribution was irregular and random.In Figure 14b, the feature distribution gradually became regular, and data of different categories began to gather within the class.In Figure 14c, the spacing within the class gradually increased to achieve effective classification results.The classification effects of various faults are clearly discernible in Figure 14d, ultimately.

Comparison of Results with Other Methods
In order to verify the fault diagnosis performance of the proposed algorithm, the proposed method is compared with a residual convolutional autoencoder (RCAE) [23], sparse autoencoders (SAE), CNN, DenseNet, and ResNet.Detailed parameters of the other five methods are shown in Table 4.The classifier structure is the same as the EFENet network

Comparison of Results with Other Methods
In order to verify the fault diagnosis performance of the proposed algorithm, the proposed method is compared with a residual convolutional autoencoder (RCAE) [23], sparse autoencoders (SAE), CNN, DenseNet, and ResNet.Detailed parameters of the other five methods are shown in Table 4.The classifier structure is the same as the EFENet network structure "2048-1024-256-7".The RCAE architecture comprises two conv-pool layers and one deconv layer.The CNN network is composed of three conv-pool layers.The ResNet and DenseNet networks consist of a residual module and a dense connection block module (den), respectively.The convolutional kernel has a size of 3 and a stride of 1, while the pooling operation utilizes maximum pooling with a stride of 2. The batch size is set to 36 and the learning rate to 0.005.The comparison of the results of five different methods' operations are shown in Figure 15.It is evident from the figure that each method exhibits an average classification accuracy of 98.52%, 97.36%, 93.58%, 82.36%, and 96.42%, respectively, with the SAE method demonstrating the lowest diagnostic accuracy.Notably, our proposed method surpasses CNN by a margin of 7% and ResNet by a margin of 4%.These comparative findings highlight the superior fault diagnosis performance achieved by our proposed approach, further validating the effectiveness of the information feature selection module and channel attention mechanism module in extracting bearing fault features from acoustic array signals.
F1-Score = 2 × (Precision × Recall)/(Precision + Recall) (16) where TP represents true positive, which refers to the number of correctly predicted positive samples.FP represents false positive, indicating the number of incorrectly predicted positive samples.TN stands for true negative, denoting the number of accurately predicted negative samples.FN signifies false negative, representing the number of erroneously predicted negative samples.The comparison results of the five methods are shown in Table 5.The values of each index of the proposed method, as shown in Table 5, surpass those of other methods, thereby further substantiating the accuracy of the proposed method.

Antinoise Experimental Analysis
In order to further evaluate the noise absorption capability of the algorithm proposed in this paper, 2 dB white Gaussian noise was added to the collected acoustic array signal sample, and the SNR formula is defined as follows: SNR = 10lg(P signal /P noise ) The three methods, namely, RCAE, DenseNet, and ResNet, which exhibited relatively good diagnostic performance in the previous section were selected and compared with the approaches proposed in this paper.The comparative results are presented in Table 6.These findings demonstrate that EFENet achieves a fault diagnosis accuracy as high as 96.62%, greater than the other three methods and indicating its excellent antinoise capability and superior robustness.Furthermore, the average iteration time of the EFENet model is recorded at 0.25 s, highlighting its computational efficiency suitable for acoustic-signalbased rolling bearing fault diagnosis applications.The t-SNE clustering results of the four methods are presented in Figure 16.It is evident from the figure that the fault clustering performance of the ResNet method is subpar, as all seven types of fault data appear to be intertwined without effective separation.While both the DenseNet and RCAE methods exhibit a relatively satisfactory ability to separate most of the fault data, there still exist instances of misclassification and indistinct boundaries.In contrast, our proposed method demonstrates superior classification effectiveness with distinct boundaries and substantial interclass spacing, further substantiating its superiority.The t-SNE clustering results of the four methods are presented in Figure 16.It is evi dent from the figure that the fault clustering performance of the ResNet method is subpar as all seven types of fault data appear to be intertwined without effective separation While both the DenseNet and RCAE methods exhibit a relatively satisfactory ability to separate most of the fault data, there still exist instances of misclassification and indistinc boundaries.In contrast, our proposed method demonstrates superior classification effec tiveness with distinct boundaries and substantial interclass spacing, further substantiating its superiority.The accuracy curve during the training process of the four test sets is depicted in Figure 17.It can be observed from the figure that the RCAE method exhibits the most rapid convergence speed.However, this method's accuracy is unstable, averaging 95%.The ResNet and DenseNet methods demonstrate relatively lower accuracies, reaching approximately 94% and 91%, respectively.In comparison to these approaches, our proposed method showcases a relatively fast convergence speed with a peak accuracy rate of 97%.Furthermore, our proposed method demonstrates stable convergence accuracy, providing further evidence for its ability to suppress noise and achieve high robustness under noisy conditions.The accuracy curve during the training process of the four test sets is depicted in Figure 17.It can be observed from the figure that the RCAE method exhibits the most rapid convergence speed.However, this method's accuracy is unstable, averaging 95%.The ResNet and DenseNet methods demonstrate relatively lower accuracies, reaching approximately 94% and 91%, respectively.In comparison to these approaches, our proposed method showcases a relatively fast convergence speed with a peak accuracy rate of 97%.Furthermore, our proposed method demonstrates stable convergence accuracy, providing further evidence for its ability to suppress noise and achieve high robustness under noisy conditions.

RCAE
The Equations ( 12)-( 16) indicators were retained for the analysis of all methods, and the resulting findings are presented in Table 7.As can be seen from Table 7, the values of each index of the proposed method are greater than those of the other methods, which once again proves that the proposed method has good antinoise ability.

Figure 1 .
Figure 1.Schematic diagram of acoustic array measuring points.

Figure 1 .
Figure 1.Schematic diagram of acoustic array measuring points.

Figure 2 .
Figure 2. Two-dimensional grayscale map after reshaping process of acoustical signal.

Figure 2 .
Figure 2. Two-dimensional grayscale map after reshaping process of acoustical signal.
) where σ(•) represents sigmoid activation function, and x IFSM c indicates the signal processed by IFSM.x CAMM c is the output signal after CAMM.AvgPool and MaxPool represent average and maximum pooling operations, respectively.
Sensors 2023, 23, 8703 6 of 18 Sensors 2023, 23, x FOR PEER REVIEW 6 o where    σ represents sigmoid activation function, and IF SM c x indicates the signal p cessed by IFSM.C AM M c x is the output signal after CAMM.AvgPool and MaxPool repres average and maximum pooling operations, respectively.
coefficient; b l stands for the deviation coefficient; and m and n indic the size of the input information.
Sensors 2023, 23, x FOR PEER REVIEW 6 of where    σ represents sigmoid activation function, and IF SM c x indicates the signal pr cessed by IFSM.C AM M c x is the output signal after CAMM.AvgPool and MaxPool represe average and maximum pooling operations, respectively.
the features extracted from the lth convolution kernel; , coefficient; b l stands for the deviation coefficient; and m and n indica the size of the input information.

Figure 9 .
Figure 9.The different failure components.
The judicious selection of batch size and learning only yields a relatively robust model but also significantly reduces computation head.The model optimizer utilizes the Adam algorithm with a learning rate of 0 epochs set to 100.Both training and testing sample batch sizes are set to 36, whi parameters are updated automatically through backpropagation.Taking measuri 1 as an example, the time-domain signals of seven different fault types are shown i 10.

Figure 9 .
Figure 9.The different failure components.
001 and epochs set to 100.Both training and testing sample batch sizes are set to 36, while model parameters are updated automatically through backpropagation.Taking measuring point 1 as an example, the time-domain signals of seven different fault types are shown in Figure 10.

Figure 10 .
Figure 10.The time-domain signals of 7 different fault types.

Figure 10 .
Figure 10.The time-domain signals of 7 different fault types.

Figure 11 . 11 . 18 Figure 12 .
Figure 11.Loss and accuracy of the EFENet training process.11.Loss and accuracy of the EFENet training process.Sensors 2023, 23, x FOR PEER REVIEW 12 of 18

Figure 13
Figure 13 describes the weight proportion of seven types of faults under different channels.It can be clearly seen from Figure 10 that different acoustic array channels have different weight contribution values.Channel 3 and channel 14 have the largest weight contribution values, with a weight coefficient of 0.8, while the weight contribution values of other channels are relatively low.It is further proved that it is necessary to apply the channel a ention mechanism module.

Figure 12 .
Figure 12.The confusion matrix results for EFENet.

Figure 13
Figure 13 describes the weight proportion of seven types of faults under different channels.It can be clearly seen from Figure 10 that different acoustic array channels have different weight contribution values.Channel 3 and channel 14 have the largest weight contribution values, with a weight coefficient of 0.8, while the weight contribution values

Figure 12 .
Figure 12.The confusion matrix results for EFENet.

Figure 13
Figure 13 describes the weight proportion of seven types of faults under different channels.It can be clearly seen from Figure 10 that different acoustic array channels have different weight contribution values.Channel 3 and channel 14 have the largest weight contribution values, with a weight coefficient of 0.8, while the weight contribution values of other channels are relatively low.It is further proved that it is necessary to apply the channel a ention mechanism module.

Figure 13 .
Figure 13.The weight values of different faults on different channels.

Figure 13 .
Figure 13.The weight values of different faults on different channels.

Table 1 .
Structural parameters of rolling bearings.

Table 1 .
Structural parameters of rolling bearings.

Table 1 .
Structural parameters of rolling bearings.

Table 2 .
Setup of the fault dataset of bearings.

Table 2 .
Setup of the fault dataset of bearings.

Table 4 .
Structure and parameter setting of comparison methods.

Table 4 .
Structure and parameter se ing of comparison methods.

Table 5 .
Comparative results of different evaluation indexes.

Table 6 .
Results of four models based on fivefold cross-validation.